(54) Method of replication-based garbage collection in a multiprocessor system



(57) Improved method of replication-based garbage collection
in a multiprocessing system comprising a plurality of
processors, a memory divided into a current area (from-space)
used by the processors during current program execution and a
reserved area (to-space), and at least one garbage collector
for performing, when necessary, a garbage collection
consisting in flipping the roles of the current area and the
reserved area after all the live objects stored in the current
area have been copied into the reserved area, and for
reclaiming the current area after the flipping operation.
Several program threads (mutators) are currently running in
parallel and the garbage collector performs the garbage
collection in parallel with the program threads, the flipping
operation being performed after the program threads have been
stopped and the garbage collection has been completed. The
method comprises the steps of storing, during normal program
execution, a record in a local buffer allocated to each
program thread each time that thread updates a memory
location, and adding this local buffer, when full, to a global
list of buffers using a first wait-free synchronization
operation, and, during garbage collection, removing the local
buffers one by one from the global list of buffers using a
second wait-free synchronization operation, and looping over
the records in each removed local buffer and copying the
updated memory locations into the reserved area until the
global list is empty.
Description
Technical field 

[0001] The present invention relates generally to a technique
for automatically reclaiming the memory space occupied by data
objects, referred to as garbage, that the running program will
no longer access, and relates particularly to a method of
replication-based garbage collection in a multiprocessor
environment.

Background 

[0002] Garbage collection is the automatic reclamation of
computer storage. While in many systems programmers must
explicitly reclaim heap memory at some point in the program,
by using a "free" or "dispose" statement, garbage collected
systems free the programmer from this burden. The garbage
collector's function is to find data objects that are no
longer in use and make their space available for reuse by the
running program. An object is considered garbage, and subject
to reclamation, if it is not reachable by the running program
via any path of pointer traversals. Live (potentially
reachable) objects are preserved by the collector, ensuring
that the program can never traverse a "dangling pointer" into
a deallocated object.
[0003] The basic functioning of a garbage collector 
consists, abstractly speaking, of two parts : 

1. Distinguishing the live objects from the garbage
in some way, or garbage detection, and 

2. Reclaiming the garbage objects' storage, so that 
the running program can use it. 

[0004] In practice, these two phases may be function- 
ally or temporally interleaved, and the reclamation tech- 
nique is strongly dependent on the garbage detection 
technique. 

[0005] In general, garbage collectors use a "liveness"
criterion that is somewhat more conservative
than those used by other systems. This criterion is
defined in terms of a root set and reachability from 
these roots. At the point when garbage collection 
occurs, all globally visible variables of active procedures 
are considered live, and so are the local variables of any 
active procedures. The root set therefore consists of the 
global variables, local variables in the activation stack, 
and any registers used by active procedures. Heap 
objects directly reachable from any of these variables 
could be accessed by the running program, so they 
must be preserved. In addition, since the program might 
traverse pointers from those objects to reach other 
objects, any object reachable from a live object is also 
live. Thus, the set of live objects is simply the set of 
objects on any directed path of pointers from the roots. 
[0006] Any object that is not reachable from the root set is
garbage, i.e., useless, because there is no legal sequence of
program actions that would allow the program to reach that
object. Garbage objects therefore cannot affect the future
course of the computation, and their space may be safely
reclaimed.
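
The reachability criterion of paragraphs [0005] and [0006] can be illustrated by a minimal sketch. The heap layout, the object names and the function `live_objects` are hypothetical and serve only as an illustration; they are not part of the described method.

```python
from collections import deque

def live_objects(roots, heap):
    """Return the set of objects reachable from the root set.

    `heap` maps each object name to the list of objects it points to
    (an assumed, simplified representation of pointers).
    """
    live = set()
    work = deque(roots)
    while work:
        obj = work.popleft()
        if obj in live:
            continue          # already visited through another path
        live.add(obj)
        work.extend(heap[obj])  # follow every pointer held by obj
    return live

# Hypothetical heap: a -> b -> c, d -> a, and e stands alone.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": []}
roots = ["a"]

live = live_objects(roots, heap)
garbage = set(heap) - live  # d points to a live object but is itself unreachable
```

In this sketch, object d is garbage even though it refers to a live object, since liveness depends only on paths starting from the roots.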

[0007] Given the basic two-part operation of a garbage
collector, several variations are possible. The first part,
that is, distinguishing live objects from garbage, may be done
by several methods. Among them, copying garbage collection
does not really collect garbage. Rather, it moves all of the
live objects into one area of the heap (the space in memory
where all objects are held), whereas the rest of the heap,
which then holds only garbage, can be reused for new objects.

[0008] A very common kind of copying garbage collection is the
semi-space collector. In this scheme, the space devoted to the
heap is subdivided into two parts, a current area or
from-space and a reserved area or to-space. During normal
program execution, only the from-space is in use. When the
running program requests an allocation that will not fit in
the unused area of the from-space, the program is stopped and
the copying garbage collector is called to reclaim space. The
roles of the current area and reserved area are flipped, that
is, all the live data are copied from the from-space to the
to-space.

[0009] Once the copying is completed, the to-space is made the
current area and program execution is resumed. Thus, the roles
of the two spaces are reversed each time the garbage collector
is invoked.
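
The semi-space scheme of paragraphs [0008] and [0009] can be sketched as follows. This is only an illustrative, stop-the-world copy loop in the style commonly attributed to Cheney; the object representation (lists of fields, with pointers written as `("ptr", address)` tuples) and all names are assumptions made for the example.

```python
def is_ptr(value):
    """A field is a pointer if it is a ("ptr", address) tuple (assumed encoding)."""
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ptr"

def collect(from_space, roots):
    """Copy every object reachable from `roots` into a fresh to-space.

    Returns the to-space (a list indexed by new address) and the updated roots.
    """
    to_space = []
    forward = {}  # from-space address -> to-space address

    def copy(addr):
        if addr not in forward:
            forward[addr] = len(to_space)
            to_space.append(list(from_space[addr]))  # fields still refer to from-space
        return forward[addr]

    new_roots = [("ptr", copy(root[1])) for root in roots]
    scan = 0
    while scan < len(to_space):           # objects copied but not yet scanned
        obj = to_space[scan]
        for i, value in enumerate(obj):
            if is_ptr(value):
                obj[i] = ("ptr", copy(value[1]))  # redirect pointer to the copy
        scan += 1
    return to_space, new_roots

# Hypothetical from-space: objects 10 and 20 refer to each other, 30 is garbage.
from_space = {10: [1, ("ptr", 20)], 20: [2, ("ptr", 10)], 30: [99]}
roots = [("ptr", 10)]
to_space, new_roots = collect(from_space, roots)
```

After the copy, the to-space holds only the two live objects, and the from-space can be reused as the reserved area for the next collection.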
[0010] The technique of replication-based garbage collection
is to let the collector work in parallel with the program
threads or mutators. In contrast to previous copying garbage
collection algorithms, replication-based garbage collection
delays the flip until the end of the collection cycle. While
the mutators keep running and operate on from-space, the
collector replicates the live objects from the from-space to
the to-space. Finally, in the flip stage, the mutators are
stopped and then roots are updated to point to the replicated
objects in the to-space.

[0011] But, while the replication is executed, objects in
from-space keep on changing and this has to be reflected in
the to-space replica. In order to make the replica consistent,
the mutators log all modifications to a mutation log. The
collector flips after it has cleared the mutation log, that
is, after it has applied each update to the replica. In
practice, the collector stops the mutator threads for a short
pause during which it updates the mutator roots, and then
flips the roles of from-space and to-space.
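
The role of the mutation log in paragraph [0011] can be illustrated by a minimal single-threaded sketch. The dictionaries, the record format (object address and field index) and the function names are assumptions chosen for the example only.

```python
from_space = {"x": [1, 2], "y": [3]}  # hypothetical objects, each a list of fields
replica = {}                           # to-space copies, keyed by from-space address
mutation_log = []                      # records of (address, field index)

def mutator_write(addr, field, value):
    """The mutator keeps working on from-space and logs each modification."""
    from_space[addr][field] = value
    mutation_log.append((addr, field))

def replicate(addr):
    """The collector copies one object into to-space."""
    replica[addr] = list(from_space[addr])

def drain_log():
    """The collector re-copies every logged location so the replica catches up."""
    while mutation_log:
        addr, field = mutation_log.pop(0)
        if addr in replica:  # objects not yet replicated are copied whole later
            replica[addr][field] = from_space[addr][field]

replicate("x")
mutator_write("x", 0, 10)      # the replica of x is now out of date
mutator_write("y", 0, 30)      # y has not been replicated yet
stale_value = replica["x"][0]
replicate("y")
drain_log()                    # the log must be empty before the flip
```

Once the log has been drained, the replica matches from-space, which is the condition under which the flip may take place.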

[0012] However, the above replication-based garbage collection
is not suitable for a modern multiprocessor system wherein it
is not guaranteed that the operations executed by one
processor always appear in the same order in the view of
another processor. Thus, it is possible that the collector
will see the update of a location only after it reads the
update to the mutation log. From the collector standpoint,
this means that it might copy the contents of the location
before the new value actually appears in its view. As a
consequence, the new replica in to-space will contain an
outdated value of the location which, furthermore, will never
be updated.

Summary of the Invention 

[0013] Accordingly, the main object of the invention is
to provide a new method of replication-based garbage 
collection which can be run in a multiprocessor system 
without the risk that the contents of memory locations 
are replicated from the current area to the reserved area 
while their updates have not been taken into considera- 
tion. 

[0014] Therefore, the invention relates to an improved
method of replication-based garbage collection in a 
multiprocessing system comprising a plurality of proc- 
essors, a memory divided into a current area (from- 
space) used by the processors during current program 
execution and a reserved area (to-space), and at least 
one garbage collector for performing, when necessary, 
a garbage collection consisting in flipping the roles of 
the current area and reserved area after all the live 
objects stored in current area have been copied into the 
reserved area and for reclaiming the current area after 
the flipping operation. Several program threads (muta- 
tors) are currently running in parallel and the garbage 
collector performs the garbage collection in parallel with 
the program threads, the flipping operation being per- 
formed after the program threads have been stopped 
and the garbage collection has been completed. The 
method of replication-based garbage collection com- 
prises the steps of storing, during normal program exe- 
cution, a record in a local buffer allocated to each 
program thread each time this program thread updates 
a memory location, and adding this local buffer when full 
to a global list of buffers using a first synchronization 
operation, and, during garbage collection, removing the 
local buffers one by one from the global list of buffers 
using a second synchronization operation, and looping 
over records in each removed local buffer and copying 
the updated memory locations into the reserved area 
until the global list is empty. 

Brief description of the drawings 

[0015] The objects, characteristics and advantages of
the invention will become clear from the following 
description given in reference to the accompanying 
drawings wherein : 

Fig. 1 represents a schematic block diagram of the
local buffers associated with each processor in a
multiprocessor system and the global list of the
local buffers according to the method of the invention.

Fig. 2 is a flow chart representing the steps of
updating memory locations and storing records in a
local buffer by a mutator.

Fig. 3 is a flow chart representing the different steps
under the control of the collector during a collection
cycle according to the method of the invention.

Fig. 4 is a flow chart representing the different steps
performed by the collector for looping over records
in a local buffer during a collection cycle according
to the method of the invention.

Detailed description of the invention 

[0016] Referring to Figure 1, the principle of the invention
is to associate a local buffer 10, 12 or 14 with each one of
the program threads 16, 18 or 20, respectively, running in
parallel. This local buffer is used by the program thread to
store all its mutation records rather than storing them
directly into the mutation log, as in the previous methods for
replication-based garbage collection. Once a local buffer 10,
12 or 14 is filled with records, the mutator adds a pointer to
this buffer to a global list 22, which is an array of
pointers. Adding a pointer to global list 22 is done using a
synchronization operation as explained below. When the garbage
collection is performed, the collector removes the pointers
from the global list one by one using the same synchronization
mechanism, and performs the needed updates on the replica as
dictated by the buffer records.

[0017] Note that processes could be used instead of program
threads to implement the invention. Such a process contains an
address space and several threads, one of them being the main
thread. Each thread (sometimes called thread of control) has
its own stack, registers and program counter. All threads
share the memory space of the process. When a process with no
threads is run, all the properties of a thread become the
properties of the process.

[0018] The different steps of the method according to the
invention are detailed in Figures 2, 3 and 4. As illustrated
in Figure 2, after starting (30) the collection cycle, the
program thread, also called mutator, updates a memory location
(changes its contents) (32). The record of this update is
stored in the associated local buffer (34). It is then
determined whether the local buffer is full (36). If not, the
updating operation is ended (38). If so, a memory coherence
synchronization is first performed (39). Then, since the
global list is shared by all mutators and the collector, the
local buffer is added to the global list of buffers (40) using
a synchronization such as a wait-free synchronization or any
other appropriate synchronization well known to those skilled
in the art. After the synchronization, a new local buffer is
allocated to the mutator (41).
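
The mutator steps of Figure 2 can be sketched as follows. This is a simplified illustration under stated assumptions: a record is taken to be the address of the updated location, the buffer capacity is arbitrary, `memory_coherence_sync` is a hypothetical placeholder for the platform instruction of step 39, and a `collections.deque` (whose appends are thread-safe in CPython) stands in for the synchronized global list.

```python
from collections import deque

BUFFER_CAPACITY = 3  # hypothetical size, kept small for the example

def memory_coherence_sync():
    """Placeholder for the platform's memory coherence instruction (step 39)."""

class Mutator:
    def __init__(self, memory, global_list):
        self.memory = memory
        self.global_list = global_list
        self.local_buffer = []

    def update(self, address, value):
        self.memory[address] = value                  # step 32: update the location
        self.local_buffer.append(address)             # step 34: store the record
        if len(self.local_buffer) < BUFFER_CAPACITY:  # step 36: is the buffer full?
            return                                    # step 38: end
        memory_coherence_sync()                       # step 39
        self.global_list.append(self.local_buffer)    # step 40: publish the buffer
        self.local_buffer = []                        # step 41: allocate a new buffer

memory = {}
global_list = deque()
mutator = Mutator(memory, global_list)
for address in range(4):
    mutator.update(address, address * 10)
```

After four updates, one full buffer has been published to the global list and the fourth record sits in the freshly allocated local buffer.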

[0019] Note that the memory coherence synchronization is
required in view of the "partial memory coherence", meaning
that when a program thread on one processor performs several
write operations, the order of the updates may appear
different to a second program thread running on a different
processor. When this incoherence endangers the correctness of
a concurrent program, the programmer must make sure that such
an inconsistency does not occur at the sensitive spots in the
program. Each platform offers its own special instructions to
enforce memory coherence. These instructions guarantee that
any update operation performed before the memory coherence
synchronization instruction is perceived by all threads as
occurring before any update performed after the memory
coherence synchronization.

[0020] Note also that, although any appropriate
synchronization could be used for adding a local buffer to the
global list, a wait-free synchronization is preferable.
Indeed, a wait-free synchronization operation is performed by
a synchronization mechanism that works in a "wait-free"
manner, that is, without blocking the computer that uses the
instruction. Such an operation can be a compare and swap
instruction taking three parameters: address, compared-value
and new-value. If the memory value at the given address
matches the given compared-value, then the new-value is put
into the location. The instruction returns a code indicating
whether the comparison and setting were successful. The main
feature of this instruction is that it is done atomically.
Namely, no parallel process can change the value at the same
time that the compare and swap instruction is executed. After
the failure of such an instruction, the process may decide
whether to try again or to execute other code. In contrast to
a wait-free synchronization, a blocking synchronization is a
synchronization which keeps the processor blocked until a
certain event happens. Thus, with a blocking synchronization,
a processor performing work fed by another processor may
decide to wait, when the shared list of records is empty,
until a record is written into this list. In the invention,
the wait-free synchronization guarantees that, if more than
one mutator is modifying the global list by adding a local
buffer, then the global list will not be corrupted and the
changes will be reflected properly in the list. It must be
noted that, on some platforms, a wait-free synchronization
causes an implicit memory coherence synchronization. In such a
case, only the wait-free synchronization of step 40 in Figure
2 is needed and step 39 does not exist.
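
The compare and swap semantics of paragraph [0020] can be illustrated by the following sketch. Python has no hardware compare and swap, so the hypothetical class `AtomicRef` emulates its atomicity with a lock; the lock stands in for the hardware guarantee and is not part of the technique. The global list is modelled here as a linked stack rather than an array of pointers, which is an assumption made to keep the example short. A retry loop of this kind is usually described as lock-free.

```python
import threading

class AtomicRef:
    """Emulates a memory word supporting an atomic compare and swap."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, compared_value, new_value):
        """Store new_value only if the word still holds compared_value."""
        with self._lock:
            if self._value is compared_value:
                self._value = new_value
                return True
            return False

class Node:
    def __init__(self, buffer, next_node):
        self.buffer = buffer
        self.next = next_node

def push_buffer(head, buffer):
    """Add a buffer to the list; retry if another thread changed the head."""
    while True:
        old = head.load()
        if head.compare_and_swap(old, Node(buffer, old)):
            return

def pop_buffer(head):
    """Remove one buffer from the list, or return None if it is empty."""
    while True:
        old = head.load()
        if old is None:
            return None
        if head.compare_and_swap(old, old.next):
            return old.buffer

head = AtomicRef(None)

def worker(thread_id):
    for i in range(100):
        push_buffer(head, (thread_id, i))

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

popped = []
while True:
    buffer = pop_buffer(head)
    if buffer is None:
        break
    popped.append(buffer)
```

In this sketch no pushed buffer is lost or duplicated even though four threads add to the list concurrently, which corresponds to the guarantee that the global list is not corrupted.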
[0021] In parallel with the recording of location updates by
the mutators, the collector starts (42) a buffer reading cycle
as illustrated in Figure 3. It is first determined whether the
global list is empty (44). If not, a buffer is removed from
the global list (46) using a wait-free synchronization
operation identical to the synchronization operation used for
adding the buffers to the global list. Then, the collector
goes over all records in the buffer and copies the changed
values into the to-space (48).



[0022] In case the global list of local buffers is empty, the
collector stops all mutators (50) in order to finish the
collection. It verifies again whether the global list is empty
(52), since one of the mutators may have added a buffer before
stopping. If it is not empty, the collector again performs the
operation of removing the buffers from the global list (54)
and the operation of looping over the records in the buffers
to apply updates (56). When the global list has been emptied
while the mutators are stopped, the collector loops over all
local buffers that have not yet been added to the global list
(58). Finally, the collector completes the collection cycle by
performing the flip between the from-space and the to-space
(60), reactivates the mutators (62) and ends the collection
cycle (64).
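
The collector steps of Figure 3 can be sketched as a single function. This is an illustration only: the callbacks `apply_buffer`, `stop_all`, `flip` and `resume_all`, the class `FakeMutator` and the use of a `collections.deque` as the global list are all assumptions introduced for the example.

```python
from collections import deque

def collection_cycle(global_list, mutators, apply_buffer, stop_all, flip, resume_all):
    # Steps 44 to 48: drain the global list while the mutators keep running.
    while True:
        try:
            buffer = global_list.popleft()
        except IndexError:
            break
        apply_buffer(buffer)
    stop_all()                               # step 50: stop all mutators
    while global_list:                       # step 52: buffers added before stopping?
        apply_buffer(global_list.popleft())  # steps 54 and 56
    for mutator in mutators:                 # step 58: partially filled local buffers
        apply_buffer(mutator.local_buffer)
        mutator.local_buffer = []
    flip()                                   # step 60
    resume_all()                             # step 62

class FakeMutator:
    def __init__(self, local_buffer):
        self.local_buffer = local_buffer

events = []
global_list = deque([["a", "b"], ["c"]])
mutators = [FakeMutator(["d"]), FakeMutator([])]

def stop_all():
    events.append("stop")
    global_list.append(["e"])  # simulates a buffer published just before stopping

collection_cycle(
    global_list,
    mutators,
    apply_buffer=events.extend,
    stop_all=stop_all,
    flip=lambda: events.append("flip"),
    resume_all=lambda: events.append("resume"),
)
```

The trace in `events` shows why the second check of step 52 is needed: record e only reaches the global list as the mutators stop, and it is still applied before the flip.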

[0023] The operation of looping over the records in a local
buffer (steps 48 and 56 of Figure 3) is illustrated in Figure
4. After starting (70), the first address in the buffer is
scanned (72). The location of the replica is determined in the
to-space and the contents of the updated location are copied
from from-space into to-space (74). At this point, the
collector determines whether the value copied into to-space is
a pointer (76). If so, the referred-to objects are scanned
(78) and the pointer to the object is updated to refer to the
new copy (80). If not, it is determined whether the scanned
record is the last one (82), in which case the process is
ended (84). If it is not the last one, the collector scans the
next record address in the local buffer (86) and performs the
same process again for this record. Note that the scanning
operation means updating the references to from-space into
references to to-space and copying the referenced objects from
from-space to to-space if not yet copied, as is well known to
those skilled in the art.
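
The loop of Figure 4 can be sketched as follows. The representation is hypothetical: objects are lists of fields, pointers are `("ptr", address)` tuples, a record is a pair (object address, field index), and a `forward` table maps from-space addresses to replica addresses. Skipping records for objects that have no replica yet is also an assumption of this sketch.

```python
def is_ptr(value):
    """A field is a pointer if it is a ("ptr", address) tuple (assumed encoding)."""
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ptr"

def replicate(addr, from_space, to_space, forward):
    """Scan an object: copy it (and what it refers to) if not yet copied."""
    if addr in forward:
        return forward[addr]
    new_addr = len(to_space)
    forward[addr] = new_addr              # recorded first so cycles terminate
    to_space.append(list(from_space[addr]))
    obj = to_space[new_addr]
    for i, value in enumerate(obj):
        if is_ptr(value):
            obj[i] = ("ptr", replicate(value[1], from_space, to_space, forward))
    return new_addr

def apply_records(buffer, from_space, to_space, forward):
    for addr, field in buffer:                    # steps 72 and 86
        if addr not in forward:
            continue                              # assumption: no replica yet
        value = from_space[addr][field]
        target = to_space[forward[addr]]          # step 74: locate the replica
        target[field] = value                     # step 74: copy the contents
        if is_ptr(value):                         # step 76
            new_addr = replicate(value[1], from_space, to_space, forward)  # step 78
            target[field] = ("ptr", new_addr)     # step 80: refer to the new copy

# Hypothetical state: object 1 refers to object 2; object 3 is not yet reachable.
from_space = {1: [5, ("ptr", 2)], 2: [7], 3: [9]}
to_space, forward = [], {}
replicate(1, from_space, to_space, forward)

# The mutator then changes both fields of object 1 and logs two records.
from_space[1][0] = 6
from_space[1][1] = ("ptr", 3)
apply_records([(1, 0), (1, 1)], from_space, to_space, forward)
```

In this sketch the second record makes object 3 reachable, so it is copied into to-space and the replica's pointer is redirected to that copy.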

[0024] It must be noted that the synchronization mechanism
which handles the access to the global list of local buffers
is an essential feature of the invention, useful in two ways.
On one hand, it manages the queue of buffers, which must
handle parallel updates. On the other hand, this
synchronization makes sure that, when the collector gets the
buffer to work on, its view is updated to contain all the
memory modifications reported by the records in this buffer.
This is true since, when the mutator synchronizes to insert
the buffer into the queue, its view already reflects all these
modifications. When the collector later synchronizes to get
this buffer, it gets updated with these modifications as
required.
[0025] Although the invention has been described in reference
to a preferred embodiment, it is understood that numerous
changes may be resorted to by those skilled in the art without
departing from the scope of the invention. Thus, it would be
possible to use several collectors running in parallel rather
than a single collector. In such a case the synchronization
problems would depend very much on the specific way chosen to
implement the parallel collection.
Claims

1. In a multiprocessing system comprising a plurality 
of processors, a memory divided into a current area 
(from-space) used by said processors during cur- 
rent program execution and a reserved area (to- 
space), and at least one garbage collector for per- 
forming when necessary a garbage collection con- 
sisting in flipping the roles of said current area and 
reserved area after all the live objects stored in said 
current area have been copied into said reserved 
area and for reclaiming said current area after the 
flipping operation, and wherein several program 
threads (mutators) or the like are currently running 
in parallel and said garbage collector performs said 
garbage collection in parallel with said program 
threads, the flipping operation being performed 
after said program threads have been stopped and 
said garbage collection has been completed ; 

an improved method of replication-based gar- 
bage collection comprising the following steps : 

• during normal program execution, each 
program thread stores a record in a local 
buffer allocated thereto each time said pro- 
gram thread updates a memory location, 
and adds said local buffer when full to a 
global list of buffers using a first synchroni- 
zation operation, and 

• during garbage collection, said collector 
removes the local buffers one by one from 
said global list of buffers using a second 
synchronization operation, and loops over 
records in each removed local buffer and 
copies the updated memory locations into 
said reserved area until said global list is 
empty. 

2. The method according to claim 1, wherein said
synchronization operation is an instruction of wait-free
synchronization performed without blocking said
program thread or said collector which initializes
such an instruction whatever the result of said
instruction.

3. The method according to claim 2, wherein said
wait-free synchronization instruction is of the type
"compare and swap" instruction.

4. The method according to claim 1, 2 or 3, further
comprising the following steps after said program
threads have been stopped :

determining whether said global list contains
other buffers which have been added to the
global list during the removing of buffers by
said collector, and

if said global list is not empty, removing by said
collector the new added buffers one by one
from said global list, and looping over the
records in each removed buffer and copying
the updated memory locations into said
reserved area until said global list is empty.

5. The method according to claim 4, wherein said step
of looping over records and copying the updated
memory in said reserved area is also performed
with all local buffers allocated to said program
threads after said global list has been emptied.

6. The method according to any one of the preceding
claims, wherein said step of looping over records in
a local buffer consists in copying the contents of
locations which have been updated from said current
area to said reserved area.

7. The method according to claim 6, further comprising
the step of determining whether the value copied
in said reserved area is a pointer and if so,
scanning the referred-to objects and updating said
pointer.



Patent provided by Sughrue Mion PLLC - http://www.sughrue.com



EP 0 969 377 A1 







[FIG. 2: flow chart of the mutator update steps. START (30);
ADDRESS UPDATING (32); ADD TO THE LOCAL BUFFER (34); IS THE
LOCAL BUFFER FULL? (36); if no, END (38); if yes, PERFORM A
MEMORY COHERENCE SYNCHRONIZATION (39); ADD THE LOCAL BUFFER TO
THE GLOBAL LIST USING WAIT-FREE SYNCHRONIZATION (40); ALLOCATE
A NEW LOCAL BUFFER (41).]








[FIG. 3: flow chart of the collection cycle. Recoverable box
labels: REMOVE A BUFFER FROM THE GLOBAL LIST USING WAIT-FREE
SYNCHRONIZATION (46); LOOP OVER RECORDS IN THE BUFFER TO APPLY
UPDATES (48); REMOVE A BUFFER FROM THE GLOBAL LIST USING
WAIT-FREE SYNCHRONIZATION (54); LOOP OVER RECORDS IN THE
BUFFER TO APPLY UPDATES (56); LOOP OVER LOCAL BUFFERS (58);
FLIP (60); ACTIVATE MUTATORS (62); END (64).]






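[FIG. 4: flow chart of the loop over the records of a local
buffer. Recoverable box labels: START (70); FIRST ADDRESS IN
THE BUFFER (72); LOCATION OF THE REPLICA IN THE TO-SPACE (74);
SCAN OBJECT REFERENCED BY POINTER (78); UPDATE THE POINTER
(80); NEXT RECORD ADDRESS IN THE BUFFER (86).]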






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 98 48 0045

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of
relevant passages, followed by the claims to which it is
relevant:

O'TOOLE J ET AL: "CONCURRENT REPLICATING GARBAGE COLLECTION",
PROCEEDINGS OF THE CONFERENCE ON LISP AND FUNCTIONAL
PROGRAMMING, ORLANDO, JUNE 27 - 29, 1994, 27 June 1994,
pages 34-42, XP000522341, ASSOCIATION FOR COMPUTING MACHINERY
* page 34, paragraph 2 - page 36, left-hand column, line 4 *
Relevant to claim: 1-7

HERLIHY M P ET AL: "LOCK-FREE GARBAGE COLLECTION FOR
MULTIPROCESSORS", IEEE TRANSACTIONS ON PARALLEL AND
DISTRIBUTED SYSTEMS, vol. 3, no. 3, 1 May 1992, pages 304-311,
XP000274361
* page 304, paragraph I - page 307, left-hand column, line 5 *
Relevant to claim: 1-3

O'TOOLE J ET AL: "CONCURRENT COMPACTING GARBAGE COLLECTION OF
A PERSISTENT HEAP", OPERATING SYSTEMS REVIEW (SIGOPS),
vol. 27, no. 5, 1 December 1993, pages 161-174, XP000418691
* page 163, paragraph 3.1 - page 164, left-hand column,
paragraph 3.3 *
Relevant to claim: 1-7

CLASSIFICATION OF THE APPLICATION (Int.Cl.6): G06F 12/02

TECHNICAL FIELDS SEARCHED (Int.Cl.6): G06F

The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 25 November 1998
Examiner: Nielsen, O








